VTLN based on the linear interpolation of contiguous mel filter-bank energies
Abstract
This paper describes a novel feature-space VTLN method that models frequency warping as a linear interpolation of contiguous Mel filter-bank energies. The technique aims to reduce the distortion in the Mel filter-bank energy estimates that arises, when the central frequency of the band-pass filters is shifted, from the harmonic composition of voiced speech intervals and from DFT sampling. In a medium-vocabulary continuous speech recognition task, the interpolated filter-bank energy-based VTLN yields relative WER reductions as high as 11.2% and 7.6% compared with the baseline system and standard VTLN, respectively. The new scheme also provides a significant WER reduction of 7% compared with state-of-the-art VTLN methods based on linear transforms in the cepstral space. The warping factor estimated here depends more on the speaker and less on the acoustic-phonetic content than the warping factor in state-of-the-art VTLN techniques.
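The core idea, warping by interpolating between neighbouring filter-bank energies instead of recomputing the filters on the DFT bins, can be illustrated with a minimal sketch. This is not the paper's exact formulation: the function names, the mel formula, the 23-filter layout, and the clamping at the band edges are all assumptions made for illustration.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale formula (an assumption; the paper may use another variant).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_centers(n_filters, f_low, f_high):
    # Center frequencies (Hz) of filters spaced uniformly on the mel scale.
    mel_points = np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), n_filters + 2)
    return mel_to_hz(mel_points[1:-1])

def warp_energies(energies, centers_hz, alpha):
    # Hypothetical helper: approximate the energy each filter would have if its
    # center were shifted to alpha * f_m by interpolating linearly between the
    # two contiguous filters whose centers bracket the shifted position.
    # np.interp clamps to the edge values outside the covered range.
    warped_centers = alpha * centers_hz
    return np.interp(warped_centers, centers_hz, energies)

# Usage: one frame of (synthetic) filter-bank energies, warped by 5%.
centers = mel_centers(n_filters=23, f_low=0.0, f_high=8000.0)
frame = np.random.default_rng(0).random(23)
warped = warp_energies(frame, centers, alpha=1.05)
```

Because the warped energies are computed from the already-estimated filter-bank outputs, the DFT bins are not resampled for each candidate warping factor, which is the kind of distortion the abstract says the method aims to reduce.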
Similar resources
VTLN in the MFCC Domain: Band-Limited versus Local Interpolation
We propose a new easy-to-implement method to compute a Linear Transform (LT) to perform Vocal Tract Length Normalization (VTLN) on the truncated Mel Frequency Cepstral Coefficients (MFCCs) normally used in distributed speech recognition. The method is based on a Local Interpolation which is independent of the Mel filter design. Local Interpolation (LILT) VTLN is theoretically and experimentally comp...
Revisiting VTLN using linear transformation on conventional MFCC
In this paper, we revisit the linear transformation for VTLN on conventional MFCC proposed by Sanand et al. in [1], using the idea of band-limited interpolation. The filter-bank is modified to include half-filters at zero and Nyquist frequencies, as the full symmetric spectrum is required for performing band-limited interpolation. In this paper, we show that the filter-bank with half-filters doe...
Frequency warping by linear transformation of standard MFCC
A novel linear transform (LT) is proposed for frequency warping (FW) with standard filterbank based MFCC features. Here, we use the idea of spectral interpolation of [9] to perform a continuous warping in the log filterbank output domain, and incorporate both interpolation and warping into a single warped IDCT matrix. The new transformation matrix is thus mathematically simpler than in [9], and...
Linear transformation approach to VTLN using dynamic frequency warping
In this paper, we present a novel linear transformation approach to frequency warping during vocal tract length normalisation (VTLN) using the idea of dynamic frequency warping (DFW). Linear transformation among the mel-frequency cepstral coefficients (MFCC) provides the computational advantage of not having to recompute features for each warp factor in VTLN. The proposed method uses the idea of separ...
Implementing frequency-warping and VTLN through linear transformation of conventional MFCC
In this paper, we show that frequency-warping (including VTLN) can be implemented through linear transformation of conventional MFCC. Unlike the Pitz-Ney [1] continuous domain approach, we directly determine the relation between frequency-warping and the linear-transformation in the discrete-domain. The advantage of such an approach is that it can be applied to any frequency-warping and is not ...